feat: Add SmolLM2 browser-based LLM inference via WebAssembly#2

Merged
konard merged 13 commits into main from issue-1-bef2b5bd2f4e on Dec 30, 2025

Conversation


@konard konard commented Dec 29, 2025

Summary

This PR implements a proof of concept for running the SmolLM2 language model entirely in the browser, with no server-side processing, using WebAssembly for ML inference.

Key Features

  • Browser-based LLM Inference: SmolLM2-135M-Instruct model runs entirely client-side via WebAssembly
  • Rust/WASM Core: Uses HuggingFace Candle ML framework compiled to WebAssembly for efficient inference
  • Web Worker Architecture: Model inference runs in background worker, keeping UI responsive
  • React Chat UI: Modern chat interface using @chatscope/chat-ui-kit-react
  • Streaming Responses: Token-by-token streaming for real-time AI responses
  • GitHub Pages Deployment: Automatic deployment workflow

Technical Implementation

  1. wasm/ - Rust library compiled to WebAssembly

    • Uses candle-core, candle-nn, candle-transformers for SmolLM2 inference
    • Tokenizer support via HuggingFace tokenizers crate
    • Streaming token generation with callback
  2. web/ - React/TypeScript frontend

    • Vite-based build system
    • Web Worker for background model processing
    • Progress tracking for model downloads (~270MB)
  3. server/ - Local Axum dev server for testing

    • Serves static files and WASM with proper MIME types
    • CORS support for development
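The Web Worker architecture above can be sketched as a small message protocol between the UI thread and the inference worker. The message and field names below are illustrative, not the PR's actual protocol:

```typescript
// Illustrative messages the inference worker posts back to the UI thread.
type WorkerResponse =
  | { type: "progress"; loaded: number; total: number } // model download progress
  | { type: "token"; text: string }                     // one streamed token
  | { type: "done" };

// The UI thread appends streamed tokens to the current reply as they arrive,
// so the interface stays responsive while inference runs off the main thread.
function handleResponse(state: { reply: string }, msg: WorkerResponse): string {
  switch (msg.type) {
    case "progress":
      return `loading ${Math.round((msg.loaded / msg.total) * 100)}%`;
    case "token":
      state.reply += msg.text;
      return "streaming";
    case "done":
      return "done";
  }
}
```

In the real app the worker would deliver these messages via `postMessage`, and the React UI would fold them into chat state.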

Files Changed

  • Added WASM inference library in wasm/
  • Added React chat UI in web/
  • Added local dev server in server/
  • Added GitHub Pages deployment workflow
  • Updated root Cargo.toml for workspace

Test Plan

  • WASM compilation succeeds with all required features (bulk-memory, SIMD, etc.)
  • CI/CD Pipeline passes (lint, tests on Linux/macOS/Windows)
  • GitHub Pages build workflow passes
  • TypeScript compilation succeeds without errors

Manual Testing

  1. Run ./scripts/dev.sh to start local server
  2. Open http://localhost:3030 in browser
  3. Click "Load Model" button
  4. Wait for ~270MB model download
  5. Send a message and observe streaming response

Closes #1

🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #1
konard self-assigned this on Dec 29, 2025
This PR implements a proof of concept for running the SmolLM2-135M language
model directly in the browser using WebAssembly, with no server-side processing.

Key features:
- Rust WASM library using Candle ML framework for model inference
- Web Worker for background processing to keep UI responsive
- React chat UI using @chatscope/chat-ui-kit-react
- Local Rust development server with CORS support
- GitHub Pages deployment workflow
- Streaming token generation for real-time responses

Architecture:
- wasm/: Rust WASM bindings for SmolLM2 inference
- web/: React frontend with TypeScript
- server/: Local development server with Axum

Fixes #1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
konard changed the title from "[WIP] Make a proof of concept library and webpage (for GitHub Pages) and local rust server runnable which are able to run small GPT model in the browser without any installation on client side (no server processing)" to "feat: Add SmolLM2 browser-based LLM inference via WebAssembly" on Dec 29, 2025
konard and others added 10 commits December 29, 2025 14:01
…pace

This fixes the wasm-pack build error where it thought it should be part
of the parent workspace. An empty [workspace] table explicitly declares
this package as its own standalone workspace.
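The fix described here amounts to adding an empty workspace table to the crate manifest. A sketch (the package name is illustrative):

```toml
# wasm/Cargo.toml
[package]
name = "smollm2-wasm"   # illustrative name
version = "0.1.0"
edition = "2021"

# An empty [workspace] table stops cargo from walking up the directory tree
# and attaching this crate to the parent workspace.
[workspace]
```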

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The getrandom 0.3.x crate requires explicit configuration for the
wasm32-unknown-unknown target. This adds:
- .cargo/config.toml with rustflags to enable wasm_js backend
- Updated Cargo.toml comments explaining the configuration
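The rustflags configuration described above would look roughly like this (a sketch of getrandom 0.3's documented backend mechanism):

```toml
# wasm/.cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = ['--cfg', 'getrandom_backend="wasm_js"']
```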

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Both getrandom 0.2.x and 0.3.x are needed by different dependencies.
Added explicit dependency on getrandom 0.3 with wasm_js feature
to ensure proper WASM compilation.
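The explicit dependency mentioned above would be a one-line addition; getrandom 0.2.x remains in the tree as a transitive dependency of other crates. A sketch:

```toml
# wasm/Cargo.toml
[dependencies]
# Direct dependency on 0.3 so its wasm_js feature is enabled for the
# wasm32-unknown-unknown target; 0.2 is still resolved transitively.
getrandom = { version = "0.3", features = ["wasm_js"] }
```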

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added Cache field to SmolLM2Model struct
- Initialize cache during model loading
- Pass cache to forward() method as required
- Fix dim() error handling with explicit map_err

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These features are required for the wasm-bindgen output to be valid.
Bulk memory operations are used by the generated WASM code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wasm-pack doesn't always respect .cargo/config.toml settings.
Setting RUSTFLAGS environment variable directly in the workflow
ensures the bulk-memory feature is enabled during compilation.
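In the workflow, setting the variable directly looks roughly like this (step name and build command are illustrative, not the PR's actual workflow):

```yaml
# .github/workflows step (sketch)
- name: Build WASM package
  env:
    RUSTFLAGS: "-C target-feature=+bulk-memory"
  run: wasm-pack build wasm --target web --release
```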

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rust 1.87 / LLVM 20 generates bulk memory operations by default for
wasm32-unknown-unknown targets. This causes wasm-opt to fail validation
with error: "Bulk memory operations require bulk memory [--enable-bulk-memory]"

Add wasm-pack profile configuration to pass --enable-bulk-memory and
--enable-mutable-globals flags to wasm-opt during the optimization step.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Candle ML framework uses SIMD operations for optimized tensor computations.
Add --enable-simd flag to wasm-opt and +simd128 target feature to Rust
compiler flags to properly support these operations in WebAssembly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rust 1.87+ / LLVM 20 generates modern WebAssembly features by default:
- nontrapping-float-to-int: For i32.trunc_sat_* saturating conversions
- sign-ext: For sign extension operations
- reference-types: For reference type operations

Add these flags alongside existing bulk-memory, mutable-globals, and simd
flags to pass wasm-opt validation successfully.
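Taken together, the wasm-opt flags accumulated across these commits can be expressed through wasm-pack's profile metadata. A sketch of the final configuration:

```toml
# wasm/Cargo.toml (sketch): flags forwarded to wasm-opt by wasm-pack
[package.metadata.wasm-pack.profile.release]
wasm-opt = [
  "-O",
  "--enable-bulk-memory",
  "--enable-mutable-globals",
  "--enable-simd",
  "--enable-nontrapping-float-to-int",
  "--enable-sign-ext",
  "--enable-reference-types",
]
```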

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TypeScript's strict mode requires an intermediate cast to unknown when
converting between types that don't overlap. The WASM module's generated
types don't exactly match our SmolLM2Wasm interface.
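A minimal, self-contained illustration of the double cast; `SmolLM2Wasm` and the generated typings below are stand-ins for the PR's actual types:

```typescript
// The interface the app code expects (illustrative stand-in).
interface SmolLM2Wasm {
  generate(prompt: string): string;
}

// Sketch of wasm-bindgen-generated typings with different member names,
// so the two object types don't overlap from TypeScript's point of view.
interface GeneratedWasm {
  run_inference(input: string): string;
}

const raw: GeneratedWasm = {
  run_inference: (input: string) => `echo: ${input}`,
};

// A direct `raw as SmolLM2Wasm` is rejected (TS2352) under strict mode;
// routing through `unknown` opts out of the overlap check. Note the cast
// only silences the type checker -- it does not change the runtime object.
const model = raw as unknown as SmolLM2Wasm;
```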

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
konard marked this pull request as ready for review on December 29, 2025, 13:48

konard commented Dec 29, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $13.889302 USD
  • Calculated by Anthropic: $10.940544 USD
  • Difference: -$2.948758 (-21.23%)

📎 Log file uploaded as GitHub Gist (1275KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard merged commit 07b3168 into main Dec 30, 2025
10 checks passed